An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis

Authors

  • Shinnosuke Takamichi
  • Tomoki Toda
  • Yoshinori Shiga
  • Hisashi Kawai
  • Sakriani Sakti
  • Satoshi Nakamura
Abstract

In this paper, we propose parameter generation methods using rich context models in HMM-based speech synthesis as yet another hybrid method combining HMM-based speech synthesis and unit selection synthesis. In traditional HMM-based speech synthesis, the generated speech parameters tend to be excessively smoothed, which causes muffled sounds in the synthetic speech. To alleviate this problem, several hybrid methods have been proposed. Although they significantly improve the quality of synthetic speech by directly using natural waveform segments, they usually lose flexibility in converting synthetic voice characteristics. In the proposed methods, rich context models representing individual acoustic parameter segments are reformed as GMMs, and a speech parameter sequence is generated from them using the parameter generation algorithm based on the maximum likelihood criterion. Since the basic framework of the proposed methods is the same as the traditional framework, the capability of flexibly modeling acoustic features is retained. We conduct several experimental evaluations of the proposed methods from various perspectives. The experimental results demonstrate that the proposed methods yield significant improvements in the quality of synthetic speech.
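The maximum-likelihood parameter generation step mentioned above is, in the standard HMM synthesis framework, a linear solve relating static features to their dynamic (delta) constraints. The sketch below is a minimal, hypothetical illustration for a one-dimensional feature stream with diagonal variances; it is not the paper's rich-context implementation, only the generic MLPG equation (W'Σ⁻¹W)c = W'Σ⁻¹μ it builds on.

```python
import numpy as np

def mlpg_1d(mu, var):
    """Maximum-likelihood parameter generation for a 1-D feature stream.

    mu, var: arrays of shape (T, 2) holding, per frame, the mean and
    variance of the static and delta feature. Returns the static
    trajectory c (shape (T,)) maximizing the Gaussian likelihood under
    the delta constraint d_t = (c_{t+1} - c_{t-1}) / 2.
    """
    T = mu.shape[0]
    # W stacks one static row and one delta row per frame: shape (2T, T)
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                 # static window
        if 0 < t < T - 1:                 # delta window (central difference)
            W[2 * t + 1, t - 1] = -0.5
            W[2 * t + 1, t + 1] = 0.5
    m = mu.reshape(-1)                    # stacked means, shape (2T,)
    P = np.diag(1.0 / var.reshape(-1))    # diagonal precision Sigma^{-1}
    # Solve the MLPG normal equations (W' P W) c = W' P m
    return np.linalg.solve(W.T @ P @ W, W.T @ P @ m)
```

Because the delta rows couple neighboring frames, the solution interpolates smoothly between the per-frame static means, which is exactly the over-smoothing behavior the paper's rich-context GMMs aim to mitigate.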


Similar papers

Improvements to HMM-based speech synthesis based on parameter generation with rich context models

In this paper, we improve parameter generation with rich context models by modifying an initialization method and further apply it to both spectral and F0 components in HMM-based speech synthesis. To alleviate over-smoothing effects caused by the traditional parameter generation methods, we have previously proposed an iterative parameter generation method with rich context models. It has been r...


A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models

This paper presents a novel statistical sample-based approach for Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC has the promising flexibility of model adaptation, quality in converted speech is significantly worse than that of natural speech. This paper addresses the problem of inaccurate modeling, which is one of the main reasons causing the quality degradatio...


Minimum generation error criterion for tree-based clustering of context dependent HMMs

Due to the inconsistency between HMM training and synthesis application in HMM-based speech synthesis, the minimum generation error (MGE) criterion had been proposed for HMM training. This paper continues to apply the MGE criterion for tree-based clustering of context dependent HMMs. As directly applying the MGE criterion results in an unacceptable computational cost, the parameter updating rul...


A speech parameter generation algorithm using local variance for HMM-based speech synthesis

This paper proposes a parameter generation algorithm using local variance (LV) constraint of spectral parameter trajectory for HMM-based speech synthesis. In the parameter generation process, we take account of both the HMM likelihood of speech feature vectors and a likelihood for LVs. To model LV precisely, we use dynamic features of LV with context-dependent HMMs. The objective experimental r...


Text-to-audio-visual speech synthesis based on parameter generation from HMM

This paper describes a technique for synthesizing auditory speech and lip motion from an arbitrary given text. The technique is an extension of the visual speech synthesis technique based on an algorithm for parameter generation from HMM with dynamic features. Audio and visual features of each speech unit are modeled by a single HMM. Since both audio and visual parameters are generated simultan...



Publication date: 2012